fix(ai/recipes): declare max_batch_tokens on google embedding recipe#1016
Open
ChenyqThu wants to merge 1 commit into
Open
fix(ai/recipes): declare max_batch_tokens on google embedding recipe#1016ChenyqThu wants to merge 1 commit into
ChenyqThu wants to merge 1 commit into
Conversation
google was the only first-party embedding recipe still missing max_batch_tokens after v0.32 garrytan#779 landed the once-per-process startup warning. Operators routing through google:gemini-embedding-001 (the default-provider path after v0.27 native gateway) saw the warning on every `gbrain query`, every kos-compat-api / MCP `/ingest` response, and every cron `gbrain` invocation. For CJK-dense or large-payload batches the absent field also forced the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting. Declared: - max_batch_tokens: 20_000 — Google's per-text cap is 2048 tokens; ~20k tokens/request is the soft cap before gemini-embedding-001 starts emitting 429s. - chars_per_token: 2 — CJK density on mixed corpora (English averages ~4, CJK ~1.5; 2 keeps pre-split safe for both). - safety_factor left at gateway default 0.8 → pre-split lands at ~8 000 chars/batch, well under any per-request floor Google publishes. Two existing regression tests pinned google as the canary "real provider with no cap declared": - test/ai/no-batch-cap-suppression.serial.test.ts assumed google STILL warned (the comment explicitly called it a fixed-cap model waiting for someone to cap it). With this patch google joins the capped set, so the test flips to assert the strong invariant: NO first-party recipe warns, because every native and openai-compat recipe now declares either max_batch_tokens or no_batch_cap. - test/ai/adaptive-embed-batch.test.ts checked `contractMatch.length > 0`. After this patch the canary set is empty, so `toBe(0)`. The once-per-process suppression mechanism is still exercised by the `firstCallCount` stability check earlier in the same test. Validation: - bun run typecheck clean - bun test test/ai/ — 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)
ChenyqThu
pushed a commit
to ChenyqThu/jarvis-knowledge-os-v2
that referenced
this pull request
May 15, 2026
PR garrytan#1016 (upstream-fix/google-recipe-max-batch-tokens) also touches two regression tests that pinned google as the canary "real provider with no cap declared". Without backporting them to fork master, `bun test test/ai/` would 2-fail on this tree even though google.ts has been capped here since af2a806. - no-batch-cap-suppression.serial.test.ts: replaces the "STILL warns for google" test with a stronger invariant — NO first-party recipe warns, because every native/openai-compat recipe now declares either max_batch_tokens or no_batch_cap. - adaptive-embed-batch.test.ts: inverts `contractMatch.length > 0` to `=== 0` and adds google to the "must be absent from warnings" assertion list (sibling to voyage + openai). Validation: `bun test test/ai/` → 144 pass / 0 fail. Also: TODO.md updates link PR-1016 + PR-1017 from the related entries; PR-2 oauth-bootstrap entry flips to FILED with the same patch metadata. On upstream merge of either PR, the next sync auto-drops these local edits via clean text merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu
pushed a commit
to ChenyqThu/jarvis-knowledge-os-v2
that referenced
this pull request
May 15, 2026
JARVIS-ARCHITECTURE.md §6.25 records the 2026-05-15 follow-up session: diagnostic correction (the historical 117k stderr "fire" that wasn't), two upstream PRs filed (garrytan#1016 google max_batch_tokens, garrytan#1017 oauth_clients forward-bootstrap), CJK keyword-only eval verdict, overlap-matrix verdict, M2-B + M2-C verdicts, mechanical cleanup (48 sync_failures ack'd, bun-test hang root-caused), and the new [facts:absorb] sub-process DB-connection latent bug. 8 TODO items closed, 1 new entry filed. CLAUDE.md "Chinese-first knowledge base" rule tightened per the CJK-eval verdict: the strict invariant is *compound CJK (4+ Han chars without whitespace) requires vector*; English and 2-3 char standalone CJK match fine on keyword. Operational guidance unchanged — the modal operator query on this brain is a compound CJK phrase that depends on vector being live. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
ChenyqThu
pushed a commit
to ChenyqThu/jarvis-knowledge-os-v2
that referenced
this pull request
May 20, 2026
v0.37.0.0 upstream sync (11 versions, 12 commits, v0.35.6.0 → v0.37.0.0). Conflicts (4) resolved: - CLAUDE.md: keep fork (upstream content lives in docs/CLAUDE-UPSTREAM.md; selected new sections will be ported in Phase 7). - llms-full.txt: take upstream then bun run build:llms in Phase 2. - test/ai/adaptive-embed-batch.test.ts: take upstream — v0.36.1.1 garrytan#1083 cherry-pick superseded our PR garrytan#1016 backport tests. - test/ai/no-batch-cap-suppression.serial.test.ts: take upstream — same. Auto-merged cleanly: VERSION (→ 0.37.0.0), CHANGELOG.md, README.md, package.json, bun.lock, .gitignore, skills/RESOLVER.md (fork ## KOS-Jarvis extensions section preserved at L128, upstream skillpack-harvest entry added at L62), skills/manifest.json (47 total, fork 4 KOS-Jarvis entries preserved), src/core/pglite-engine.ts (WAL fork patch survived at L207 — line shift from L200 due to upstream import additions). Fork-protected paths (skills/kos-jarvis/, server/, workers/, scripts/launchd/) untouched — verified via diff master..HEAD. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu
pushed a commit
to ChenyqThu/jarvis-knowledge-os-v2
that referenced
this pull request
May 20, 2026
Phase 1 conflict resolution took upstream versions of these two tests assuming v0.36.1.1 garrytan#1083 cherry-pick was the equivalent of fork PR garrytan#1016. Wrong: v0.36.1.1 went with the "warn only for the configured embedding provider" filter path (gateway.ts), while fork kept the "declare max_batch_tokens on google recipe" path (google.ts +7 lines, retained through merge). Net effect of both paths together: - gateway.ts warning filter narrows to configured provider (upstream) - google.ts has max_batch_tokens declared so won't warn anyway (fork) Fork tests expect 0 warnings; upstream tests expect 1 (google when google configured). Took fork view back — google.ts business value (pre-split CJK batches via max_batch_tokens=20_000 + chars_per_token=2, reduces reactive token-cap discovery overhead) justifies retaining the fork patch. Verification: bun test test/ai/ → 224 pass / 0 fail / 764 expect(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu
pushed a commit
to ChenyqThu/jarvis-knowledge-os-v2
that referenced
this pull request
May 20, 2026
JARVIS-ARCHITECTURE.md §6.29: - 11 versions overview (v0.35.6.0 → v0.37.0.0, 12 commits, 333 files) - 4-conflict resolution log (test/ai/ reverted to fork after finding v0.36.1.1 garrytan#1083 went a different fix path from PR garrytan#1016) - Schema v66 → v78 migration evidence (12 migrations auto-bootstrap) - ZeroEntropy lock procedure (config set ze_switch_declined_at + ze_switch_prompt_shown=true since CLI has no --decline flag) - Health score 70 → 80 evidence + new v0.36.x checks all PASS - Smoke evidence (typecheck/test/curl/query Chinese+English/patrol) CLAUDE.md: Latest sync story pointer updated to §6.29. TODO.md: - Top intro replaced with v0.37.0.0 sync context - L824 PR garrytan#1016 verdict revised — NOT superseded by v0.36.1.1 - Added 5 new P1 entries (post-v0.37.0.0): - dream-cycle LLM spend observation (calibration phases) - embedding_columns registry declare (optional) - image ingestion roadmap decision - doctor --remediate vs kos-patrol evaluation pointer - typed-claim fence for OH transcripts - Branch cleanup deferred to Lucien (worktree refs block delete) llms-full.txt regenerated via bun run build:llms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChenyqThu
pushed a commit
to ChenyqThu/jarvis-knowledge-os-v2
that referenced
this pull request
May 20, 2026
…12 commits, v0.35.6.0 → v0.37.0.0) 333 files / +52455 / -3317 LoC. 4 conflicts resolved (CLAUDE.md / llms-full.txt / 2 test/ai/ files — last 2 reverted to fork view after discovering v0.36.1.1 garrytan#1083 went a different fix path from PR garrytan#1016). Schema v66 → v78 auto-bootstrap (12 migrations, zero manual ALTER). WAL fork patch survived at src/core/pglite-engine.ts:207. ZeroEntropy switch lock applied (ze_switch_declined_at + ze_switch_prompt_shown). Doctor health_score 70 → 80; 3140 pages preserved; vector(1536) unchanged. Smoke: bun test test/ai/ 224/224 green. kos.chenge.ink/health returns 0.37.0.0 + engine=postgres. kos-patrol kickstart exit 0, dashboard written. Chinese + English query smoke top-hits in healthy band. Fork dirs unchanged (10). Branch cleanup deferred (worktree refs). PR garrytan#1016 verdict revised: NOT superseded — upstream filter path co-exists with fork declare-max_batch_tokens path. Full story: docs/JARVIS-ARCHITECTURE.md §6.29. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
googleis the only first-party embedding recipe still missingmax_batch_tokensafter v0.32 (#779) landed the once-per-process startup warning. Operators routing throughgoogle:gemini-embedding-001(the default-provider path after v0.27 native gateway) see the warning on everygbrain query, every kos-compat-api/ingestresponse, and every crongbraininvocation:For CJK-dense or large-payload batches the absent field also forces the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting — one wasted HTTP round-trip per oversized batch.
The existing test
test/ai/no-batch-cap-suppression.serial.test.tseven explicitly pins this state with the comment "google should warn (it has fixed-cap models)", framing it as a known TODO.Patch
Three field additions to
google.touchpoints.embedding:Rationale
max_batch_tokens: 20_000— Google's documented per-text cap is 2 048 tokens; ~20 k tokens per request is the soft cap beforegemini-embedding-001starts emitting 429s.chars_per_token: 2— English averages ~4 chars/token (OpenAI default), CJK ~1.5. Picking 2 reflects realistic density on mixed-corpora brains so pre-split stays safe on CJK-heavy payloads without over-shrinking on English.safety_factor— left at gateway default 0.8. Pre-split lands at20 000 × 0.8 / 2 = 8 000chars/batch.Test follow-through
Two regression tests pinned
googleas the canary "real provider with no cap declared":test/ai/no-batch-cap-suppression.serial.test.ts: assumed google STILL warned. With google capped, the test flips to assert the stronger invariant — no first-party recipe warns, because every native/openai-compat recipe now declares eithermax_batch_tokensorno_batch_cap.test/ai/adaptive-embed-batch.test.ts: checkedcontractMatch.length > 0. After this patch the canary set is empty →toBe(0). The once-per-process suppression mechanism is still exercised by thefirstCallCountstability check earlier in the same test.Both tests still gate the suppression machinery; they just no longer require any first-party recipe to be the canary. If maintainers prefer a synthetic test-only uncapped recipe to keep a positive "warning fires when expected" assertion, happy to add one in a follow-up.
Validation
bun run typecheckcleanbun test test/ai/— 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)Field repro
Filed downstream from the Jarvis KOS v2 fork (
ChenyqThu/jarvis-knowledge-os-v2), where the google native gateway is the default embedding provider since the v0.27 M3 cutover. The warning has been load-bearing log noise for ~6 weeks across kos-compat-api responses, cron logs, and operator queries. Same pattern as #627 / upstream fixwave #682+#741 (forward-bootstrap mcp_request_log) — small fork-local patch carried while upstream evaluates.🤖 Generated with Claude Code